PROBLEM


Develop statistical, physical, and computer techniques for interpreting, 
summarizing, and extrapolating oceanic and meteorologic data for reliable 
estimation of the sound velocity distribution in the ocean. Specifically, perform 
autocorrelation, regression, and trend analyses of six time-series of daily sea- 
surface temperatures. Compensate for missing data in the time-series. Examine 
the randomness of mean annual temperatures, and of amplitudes and phases 
descriptive of annual variations in temperature. 


RESULTS 


Analysis of records of sea-surface temperatures taken in the North 
Atlantic and North Pacific and up to 40 years in length has led to the following 
conclusions: 


1. The autocorrelation statistics indicate the existence of an oscilla- 
tory function with period 1 year in the records and, for most stations, an oscil- 
latory function with period 0.5 year. There is no evidence of functions with 
shorter periods. 


2. For all stations considered, a regression model containing annual 
and semiannual oscillatory terms (sines and cosines) provides a good statistical 
fit to the observed daily temperatures. The analysis compensates for missing 
data. 


3. No trends exist in the sequences of annual mean temperatures, or in
the sequences of amplitudes and phases describing the regression functions.
However, there are significant differences among the annual mean temperatures.
The behavior of the annual means, amplitudes, and phases is typical of random
statistical variables.


RECOMMENDATIONS 


Examine the results of this report from an oceanographic point of view. 
Specifically, answer the following questions: 


1. What is the length of time-series necessary to produce reliable
long-term estimates of sea-surface temperatures? 


2. Can it be expected that sea-surface temperatures will vary system- 
atically over periods of several years, in light of the randomness demonstrated 
in this report? 


INTRODUCTION


This study is the third in a series of studies concerned with the analysis 
of sea-surface temperature observations. The first study dealt with the effect of 
missing data in long time-series of sea-surface temperature measurements on 
certain regression and autocorrelation analyses.¹ The second study examined the
use of regression models for time-space interpolation of sea-surface temperature 
observations. 

This study presents the results of an autocorrelation, regression, and 
trend analysis of time-series of sea-surface temperature measurements made at 
six locations representing different oceanographic conditions, and considers the 
difficulties encountered in applying these techniques to oceanographic data 
samples.² An oceanographic interpretation of the statistical results will be
presented in a later study.

Many time-series measurements have been made at various locations. In 
the eastern Pacific Ocean such measurements have been made by Canadian and 
American oceanographers at coastal, island, and ship locations for time periods 
up to 45 years. These data have been the subject of numerous papers including, 
among others, those of Pickard and McLeod³ and Roden.⁴,⁵ This study differs
from those cited in that the original daily temperatures are used in the analysis 
without a preliminary smoothing by monthly averaging. 

The purpose of time-series analysis is to isolate trend, oscillation, and 
random elements, which are defined as follows. Trend is a gradual increase or 
decrease in a system over a long period of time; an oscillation is a variation 
about the trend that occurs with more or less regularity over some time interval; 
and a random element is an unpredictable variation in the variable. If long-term 
trend does not exist, then the primary need is the statistical fitting of some 
function to time-series to represent the oscillatory element. 

Several sets of daily sea-surface temperatures have been examined, 
covering two open ocean and four island or coastal locations (fig. 1). To indicate 
how individual temperature measurements vary throughout the year, 1 year of 
measurements for each location is presented in figure 2. These years of tempera- 
tures are taken from records that vary in length from 7 to 40 years. Pertinent 
information about the stations yielding these records is summarized in table 1. 

The data for station PAPA are being collected by Canadian oceanogra- 
phers of the Pacific Oceanographic Group, Nanaimo, British Columbia, and are 
available as a sequence of data reports. Bathythermograph observations are made 
at 0200 and 1700 GMT. The 0200 GMT data were used for this analysis if the
ship was within a 10-minute rectangular area centered at the nominal position.


¹Superscript numbers denote references in the list at the end of this report.


TABLE 1. LOCATION OF SEA-SURFACE TEMPERATURE TIME-SERIES


Location             Time Period         Number    Number Daily    Percent Possible
                                         Days      Observations    Daily Observations

Weather Ship PAPA    1/56-8/62           2409      1595            66
50°N 145°W           6 yr 7 mo
North Pacific

Weather Ship ECHO    9/49-9/56           2557      1533            60
35°N 48°W            7 yr
North Atlantic

Cape St. James       1/35-1/61           7671      6180            81
52°N 131°W           21 yr (5 yr
North Pacific        missing)

Triple Island        1/40-1/61
54°N 131°W           21 yr
North Pacific

Langara Island       1/41-1/61           7304      6402            88
54°N 133°W           20 yr
North Pacific

Scripps Pier         1/21-1/61           14610     14352           98
33°N 117°W           40 yr
North Pacific

If the 0200 GMT data were not available, other data for a given day were used. 
The station ECHO data are also taken by bathythermograph. The temperature 
measurements are probably not as accurate as the PAPA data. The Cape St. 
James, Triple Island, and Langara Island data were taken by lighthouse keepers, 
and were made within the hour prior to daytime high tide. The Scripps Pier data 
were collected by the Scripps Institution of Oceanography. 

A subjective examination of the data shows, for the PAPA, ECHO, Cape 
St. James, and Triple Island observations, maximums in August or September and 
rather flat minimums in February to May. The Langara Island and Scripps Pier 
data show a more regular sinusoidal seasonal variation. At all stations there 
appear oscillations with a duration of a few days to a few weeks at irregular in- 
tervals. At the open-ocean locations these shorter-period oscillations occur less 
frequently and their magnitude is smaller than at coastal locations. It is recog- 
nized that there exists in the data another oscillation — a diurnal variability. 
The amplitude of this diurnal oscillation is much smaller than that of the season- 
al oscillation and will not be examined in this study. However, it must be 
recognized as a factor in the overall variability. 

Figure 2. Sea-surface temperatures as a function of
time for selected years of data.


AUTOCORRELATION ANALYSIS


A visual observation of the data suggests statistically fitting some 
theoretical function which oscillates with period 1 year. Further justification is 
provided by the autocorrelation function: 


$$C_k = \mathrm{COV}(T_i, T_{i+k}) / [\mathrm{VAR}(T_i)\,\mathrm{VAR}(T_{i+k})]^{1/2}, \quad \text{for lags } k = 0, 1, 2, \ldots$$


The variable $T_i$ is the sea-surface temperature on day $i$, $T_{i+k}$ is the temperature
$k$ days later, and COV and VAR are the covariance and variance of the
variables as indicated.
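
The autocorrelation function above can be sketched in a few lines of Python. This is only an illustration, not the program used for the report: the function name `autocorrelation`, the use of `None` to mark a missing day, and the synthetic test series are all assumptions made here. Pairs of days are used only when both temperatures were observed, which is one simple way of tolerating gaps in the record.

```python
import math

def autocorrelation(temps, max_lag):
    """Return [C_0, ..., C_max_lag] for a daily series; None marks a missing day."""
    coeffs = []
    for k in range(max_lag + 1):
        # Keep only the pairs (T_i, T_{i+k}) in which both days were observed.
        pairs = [(a, b) for a, b in zip(temps, temps[k:])
                 if a is not None and b is not None]
        n = len(pairs)
        if n < 2:
            coeffs.append(float("nan"))
            continue
        mean_a = sum(a for a, _ in pairs) / n
        mean_b = sum(b for _, b in pairs) / n
        cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n
        var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / n
        var_b = sum((b - mean_b) ** 2 for _, b in pairs) / n
        denom = math.sqrt(var_a * var_b)
        coeffs.append(cov / denom if denom > 0 else float("nan"))
    return coeffs

# Synthetic example (not report data): a pure annual cycle, every 7th day missing.
series = [12.0 + 3.0 * math.sin(2 * math.pi * d / 365) for d in range(3 * 365)]
series = [None if d % 7 == 0 else t for d, t in enumerate(series)]
c = autocorrelation(series, 365)
```

For a series dominated by an annual cycle, the coefficients computed this way fall to a trough near a lag of half a year and return to a peak near a lag of one year, which is the behavior described for the upper curves of figures 3, 4, and 5.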

Figures 3, 4, and 5 present the results of an autocorrelation analysis for 
the six time-series. The upper figure for each station is the autocorrelation 
function of the daily temperatures. The peaks in the autocorrelation functions 
have magnitudes and spacings indicating so strongly the existence of an annual 
oscillation in the time-series that any statistical test of hypothesis is super- 
fluous. 

The middle set of figures presents the autocorrelations of the residuals 
(or anomalies) after removing the 12-month oscillatory terms from the original 
data (discussed in the next section). An obvious feature of these figures is the 
peaks and troughs in the autocorrelation function at intervals of 6 months for 
PAPA, ECHO, Cape St. James, and Triple Island, and the lack of this oscilla- 
tion for Langara Island and Scripps Pier. 

The lower set of figures presents the autocorrelation of the residuals 
after the annual and semiannual oscillatory terms are removed. The autocorre- 
lation function has a form typical for such residuals, decreasing as a negative 
exponential function for small lags. 

The autocorrelation function for Scripps Pier was computed out to a lag
of 1800 days, an arbitrary figure slightly over 10 percent of the total sample
length. Since conclusions based only on sampling variability of the autocorrelation
function must be avoided, values of the function significantly different from
zero at some probability level are of primary interest. If the standard deviation
of the autocorrelation coefficients were known, and normality assumptions made,
then significance levels could be determined. The Scripps Pier autocorrelation
coefficients with lags from 400 to 1800 days provide an estimate of $\sigma_c$, the
standard deviation of the nonsignificant correlations. The estimate is based on
a large sample of the autocorrelation coefficients, and the maximum lag involved
is still only a small fraction of the total time-series length. The PAPA and
ECHO series are much too short to supply such an estimate, and the records for
the other stations are of marginal length for this purpose.

This standard deviation is $\sigma_c = 0.0293$. The 95 percent significance levels for
Scripps Pier are $\pm 1.96 \times 0.0293 = \pm 0.0574$, for a nominal series length of 13140.


Figure 3. Autocorrelation coefficients for PAPA and ECHO. (Panel shown: Weather Ship "PAPA", 6 yr 7 mo of data. Curves: original surface temperature data; 12-month period removed; 6-month period removed. Vertical axis: autocorrelation coefficient. Horizontal axis: lag in days, with 6-, 12-, 18-, and 24-month marks.)

Figure 4. Autocorrelation coefficients for Cape St. James and Triple Island.

Figure 5. Autocorrelation coefficients for Langara Island
and Scripps Pier.

With some caution, this estimate is used for each location. The 95 percent
significance levels are displayed as the dashed lines in figures 3, 4, and
5. The limits are adjusted for the particular series length involved and diverge
with increasing lag, since the corresponding sample size decreases. The many
oscillations in the autocorrelation functions beyond 100 to 200 days' lag are not
significant for the lower curves of figures 3, 4, and 5. The oscillations are
merely characteristics of the particular samples of time-series available, and it is
useless to subject them to any additional correlation or spectral analysis.
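
The report does not state how the limits were adjusted for series length and lag. The sketch below shows one plausible rule, offered strictly as an assumption: the standard deviation of a nonsignificant coefficient is taken to vary inversely with the square root of the number of pairs entering the coefficient, scaled from the Scripps Pier values quoted in the text. The function name `significance_limit` and the use of the table 1 observation count for PAPA are illustrative choices.

```python
import math

SIGMA_REF = 0.0293   # Scripps Pier estimate quoted in the text
N_REF = 13140        # nominal Scripps Pier series length quoted in the text

def significance_limit(n_obs, lag, z=1.96):
    """95 percent limit, ASSUMING sigma scales as 1/sqrt(number of pairs)."""
    pairs = n_obs - lag
    if pairs <= 0:
        raise ValueError("lag must be shorter than the series")
    return z * SIGMA_REF * math.sqrt(N_REF / pairs)

# The limits widen for a short series and as the lag grows.
for name, n_obs in [("Scripps Pier", 13140), ("PAPA", 1595)]:
    limits = [round(significance_limit(n_obs, k), 4) for k in (0, 365, 730)]
    print(name, limits)
```

At zero lag and the nominal Scripps Pier length the function reproduces the quoted limit of 0.0574. For a short record such as PAPA the limits are several times wider and diverge quickly with lag, in the manner described for the dashed lines.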

Of interest is the comparison of the above standard deviation with that
resulting from a sometimes used inequality concerning the true variance of the
autocorrelation coefficient.⁶ This inequality is


$$\sigma_c^2 < \frac{4}{T} \int_0^\infty \rho^2(\tau)\, d\tau$$


where $\rho(\tau)$ is the true autocorrelation function, and $T$ is the sample length. If the
Scripps Pier empirical autocorrelation function is numerically integrated out to a
lag of 350 days, and if the function is assumed to be zero beyond that lag, an
estimate of the inequality $\sigma_c^2 < 0.004424$ is obtained. Correspondingly,
$\sigma_c < 0.0665$. The value $\sigma_c = 0.0293$ easily satisfies the inequality.


REGRESSION ANALYSIS 


The preceding autocorrelation analysis indicates that the surface-temper- 
ature time-series contains a prominent oscillatory term with a period of 12 months, 
and that four of the time-series contain an additional oscillatory term with a 
period of 6 months. 


A general model containing oscillatory functions is


$$T' = \beta_0 + \sum_{i=1}^{k} \alpha_i \sin[2\pi i (D - \theta_i)/365] + \epsilon, \qquad (1)$$

or expanding,

$$T' = \beta_0 + \sum_{i=1}^{k} [\beta_{2i-1} \sin(2\pi i D/365) + \beta_{2i} \cos(2\pi i D/365)] + \epsilon, \qquad (2)$$

where $D$ is time measured in days from some arbitrary origin, and $T'$ is the fitted
value of the surface temperature. Fitting the function of equation (2) to the
observed surface temperatures $T$ using the method of least squares yields estimates
of the regression coefficients $\beta$ and an estimate of the variance of $\epsilon$. The
amplitudes $\alpha$ and phases $\theta$ can be obtained from the $\beta$'s. The quantity $\epsilon$ is the
random error or residual term.
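
A minimal sketch of such a fit follows, assuming NumPy is available and that missing days are coded as NaN. The function name `fit_harmonics` and the synthetic data are illustrative and are not taken from the report. The conversion to amplitude and phase assumes the sine form $\alpha_i \sin[2\pi i (D - \theta_i)/365]$, so that the sine coefficient equals $\alpha_i \cos(2\pi i \theta_i/365)$ and the cosine coefficient equals $-\alpha_i \sin(2\pi i \theta_i/365)$.

```python
import numpy as np

def fit_harmonics(day, temp, k=2, period=365.0):
    """Least-squares fit of a mean plus k harmonics; NaN temperatures are skipped."""
    day = np.asarray(day, dtype=float)
    temp = np.asarray(temp, dtype=float)
    ok = ~np.isnan(temp)
    cols = [np.ones(ok.sum())]
    for i in range(1, k + 1):
        w = 2.0 * np.pi * i * day[ok] / period
        cols += [np.sin(w), np.cos(w)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, temp[ok], rcond=None)
    resid = temp[ok] - X @ beta
    resid_var = resid @ resid / (ok.sum() - X.shape[1])
    # Sine coefficients are beta[1], beta[3], ...; cosine are beta[2], beta[4], ...
    b_sin, b_cos = beta[1::2], beta[2::2]
    i = np.arange(1, k + 1)
    alpha = np.hypot(b_sin, b_cos)                                # amplitudes
    theta = -np.arctan2(b_cos, b_sin) * period / (2.0 * np.pi * i)  # phases, days
    return beta, alpha, theta, resid_var

# Synthetic, noise-free check (not report data): known amplitudes and phases,
# with every fifth day removed to imitate gaps in the record.
days = np.arange(4 * 365)
temps = (17.0 + 3.2 * np.sin(2 * np.pi * (days - 130) / 365)
         + 0.5 * np.sin(4 * np.pi * (days + 8) / 365))
temps[::5] = np.nan
beta, alpha, theta, resid_var = fit_harmonics(days, temps, k=2)
```

Because the design matrix is built only from the days actually observed, gaps need no special handling in the fit itself; the question of how nonrandom gaps affect the variances is a separate matter.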

For k = 1, equation (2) was fitted to each of the six sets of surface 
temperature data. The results are shown in table 2. 


TABLE 2. HARMONIC ANALYSIS OF ANNUAL OSCILLATION 


Location 


ECHO 


Cape St. James 


Triple Island 3.18 136.6 
Langara Island 2.90 139.2 
Scripps Pier 3.24 128.4 


Certain measures related to the statistical fit are given in table 3. The
quantity $R$ is the multiple correlation coefficient; $\sigma$ and $\sigma_R$ are the standard
deviations of the observations about their mean and about the fitted regression
curve, respectively. The F-ratio indicates whether or not the regression curve
has significantly reduced the total sum of squares.


TABLE 3. STATISTICAL FIT OF ANNUAL OSCILLATION 


Location 


Cape St. James 


Triple Island 


Langara Island 


Scripps Pier


*Adjusted for nonrandom missing data 


Van Vliet¹ has shown that the variances of regression coefficients
should be increased if there are nonrandom missing data. A similar situation
exists with the residual variances. For a fraction $f$ of missing data, the fractional
increase in regression coefficient variance attributable to nonrandom
missing data is


$$Q = 2f/(1 - f)$$

The corresponding fractional increase in the residual variance is

$$(Q + 1)(1 - f) - 1 = f$$


Locations PAPA, ECHO, Cape St. James, and Langara Island have nonrandom
missing data. A correction for such data is reflected in the columns with the
asterisks of table 3. The corrections yield more conservative estimates of
$\sigma_R$ and $F$.
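
As a worked illustration of the size of these corrections, the short sketch below evaluates both fractional increases for the four stations named. It assumes, purely for illustration, that $f$ is one minus the percent of possible daily observations listed in table 1; the report does not say that these are the exact values of $f$ that were used. The function name `variance_inflation` is hypothetical.

```python
def variance_inflation(f):
    """Return (Q, residual_increase) for a fraction f of nonrandom missing data."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must lie in [0, 1)")
    q = 2.0 * f / (1.0 - f)                          # increase in coefficient variance
    residual_increase = (q + 1.0) * (1.0 - f) - 1.0  # algebraically equal to f
    return q, residual_increase

# Percent of possible daily observations, from table 1 (ASSUMED to define f).
percent_observed = {"PAPA": 66, "ECHO": 60, "Cape St. James": 81, "Langara Island": 88}
for station, pct in percent_observed.items():
    f = 1.0 - pct / 100.0
    q, r = variance_inflation(f)
    print(f"{station:15s} f={f:.2f}  Q={q:.3f}  residual increase={r:.2f}")
```

Under this assumption the coefficient variances for PAPA would roughly double (Q is close to 1 when about a third of the days are missing), while the residual variance would rise by about a third.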

The analysis of variance leading to the F-ratio is as follows. Assume
that there are $N$ complete years of data. Since the sine and cosine functions
yield an integral number of periods per year, the $N$ years of data can be interpreted
as 1 year of data with $N$ observations per day. One aspect of this fact is that the
5 years of data missing for Cape St. James, as noted in table 1, are not pertinent
to the regression analysis. For simplicity, all years are assumed to have 365
days. The total sum of squares about the sample mean $\bar{T}$ can be partitioned as


$$\sum_{i=1}^{N} \sum_{j=1}^{365} (T_{ij} - \bar{T})^2 = N \sum_{j=1}^{365} (T'_j - \bar{T})^2 + \sum_{i=1}^{N} \sum_{j=1}^{365} (T_{ij} - T'_j)^2$$


where $T_{ij}$ is the observed temperature on the $j$-th day of the $i$-th year, and $T'_j$ is
the temperature predicted by equation (2) for the $j$-th day. For $k = 1$, the quantity


$$F = \frac{N \sum_{j=1}^{365} (T'_j - \bar{T})^2 / 2}{\sum_{i=1}^{N} \sum_{j=1}^{365} (T_{ij} - T'_j)^2 / (365N - 3)}$$


has the F-distribution with 2 and $365N - 3$ degrees of freedom. Assuming the
F-test is robust with respect to missing data, the test is applied in the present
situation with $365N$ replaced by the actual number of observations.
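
The F-ratio can be computed directly from the observations and the fitted annual curve. The sketch below is one way to do so when days are missing: both sums run over the observations actually present, and the denominator degrees of freedom use the actual number of observations, as the text describes. The function name `annual_f_ratio` and its argument layout are assumptions made for this illustration.

```python
def annual_f_ratio(obs, fitted, n_params=3):
    """F-ratio for a fitted annual curve.

    obs      list of (day_of_year, temperature) pairs, missing days simply absent
    fitted   dict mapping day_of_year to the fitted temperature for that day
    n_params number of fitted coefficients (3 for the mean plus one harmonic)
    """
    n = len(obs)
    mean = sum(t for _, t in obs) / n
    # Regression sum of squares: fitted value minus sample mean, per observation.
    ss_regression = sum((fitted[d] - mean) ** 2 for d, _ in obs)
    # Residual sum of squares: observation minus fitted value.
    ss_residual = sum((t - fitted[d]) ** 2 for d, t in obs)
    df_num = n_params - 1
    df_den = n - n_params
    f_ratio = (ss_regression / df_num) / (ss_residual / df_den)
    return f_ratio, df_num, df_den
```

With the default of three coefficients the numerator has 2 degrees of freedom, matching the $k = 1$ case in the text; passing `n_params=5` would give the corresponding ratio for the combined annual and semiannual fit.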

The least squares method is valid if (1) the error between the true 
regression curve and the observed value is distributed independently of the 
independent variables with zero mean and constant variance; and (2) ideally, 
successive errors are distributed independently of one another. Actually, the
problem of using the method of least squares when the error terms are autocorrelated
has been solved if the $\epsilon$'s follow certain autoregressive processes.⁷

The residuals, after the annual terms are removed, constitute a time-series
with another strongly oscillatory term. If the F-ratios in table 3 were
examined, this fact would at least necessitate reducing the degrees of freedom in
the denominator, in turn reducing the actual F-ratio value. In this case a detailed
examination of these F-ratios is superfluous, since the reduction would
have to be considerable before significance was marginal.

To examine the semiannual oscillations, equation (2), for k = 2, was 
fitted to each of the six sets of surface temperature data. The results are 
summarized in tables 4, 5, and 6. 


TABLE 4. REGRESSION COEFFICIENTS OF ANNUAL AND 
SEMIANNUAL OSCILLATIONS 


Location Regression Coefficients 


PAPA 


ECHO 


Cape St. James 


Triple Island 


Langara Island 


Scripps Pier


TABLE 5. AMPLITUDES AND PHASES OF ANNUAL AND 
SEMIANNUAL OSCILLATIONS 


Location Amplitude, °C Phase, days 


PAPA 


ECHO = Tall 
Cape St. James -8.0
Triple Island -25.9
Langara Island 23.0
Scripps Pier -7.8


TABLE 6. STATISTICAL FIT OF ANNUAL AND SEMIANNUAL
OSCILLATIONS. F-RATIOS ATTRIBUTABLE TO SEMIANNUAL OSCILLATION 


Location 


PAPA 


ECHO 


Cape St. James 


Triple Island 


Langara Island 


Scripps Pier 


*Adjusted for nonrandom missing data. 


The addition of semiannual oscillatory terms to the regression equation 
improves the fit obtained with the annual terms. Except for Langara Island, the 
F-ratios are still significant at the 1 percent level, but are much smaller than 
for annual terms. Because of the marginal significance for Scripps Pier, the 
question of whether the assumptions about the residuals are satisfied is more 
pertinent. 


TREND ANALYSIS 


If the surface temperature for each year could be represented by a single
quantity, it might be possible to identify a trend (gradual change in the system)
over a period of several years. The regression analysis of the time-series provides
in $\beta_0$ an estimate of the yearly average of the surface temperature. Figure
6 is a plot of the $\beta_0$'s obtained by fitting equation (2) to each calendar year of
data taken at four locations.

The statistical method chosen to test for trends is that of the theory of
runs.⁸ The test is one for randomness in a sequence of observations. The only
underlying assumption is that the variable under consideration be continuous.
The test is performed as follows. The median value for each sequence of $\beta_0$'s
is determined. Each $\beta_0$ is assigned the letter A if it is above the median or the
letter B if it is below the median, the median $\beta_0$ being omitted. A run is defined
as a succession of one or more identical letters. The following sequences of runs
are obtained.

Figure 6. Annual averages, $\beta_0$, of sea-surface temperatures for coastal and island stations.


Scripps Pier:


BB A B AAA B AAA BB AB AA B AAAAA BBB A BBBBBB A BB AAA B (19 runs)


Langara Island: 

AAAA BB A BBBBB A BBB AAAA (7 runs) 
Triple Island: 

AAA B AA BBBBBB A BBB AAAA (7 runs) 
Cape St. James: 


AA BB AAAA BBBBBB A BB AAA (7 runs) 


If the $\beta_0$'s are randomly distributed with respect to time, a fairly large
number of runs is expected. If a trend exists in a sequence of $\beta_0$'s, only a few
runs are expected. The theoretical distribution and the critical values of the
number of runs can be determined.⁹ The critical number of runs at the 5 percent
probability level for the 40 observations at Scripps Pier is 15, the null hypothesis
of no trend being rejected if there are 15 or fewer. Since the observed number of
runs is 19, the null hypothesis is not rejected, and it is concluded that no long-term
trend exists in the $\beta_0$'s for the 40 years of data for Scripps Pier.

Langara Island, Triple Island, and Cape St. James each have 7 runs in 20
observations, the median $\beta_0$ being omitted for Triple Island. The probability of 7
or fewer runs arising by chance in 20 observations is 0.051, so the 7 runs are not
quite significant at the 5 percent level. It is concluded that no trend exists in
the records for any of the three locations.
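
The run counts and the probability quoted above can be checked with a short sketch. It uses the standard exact distribution of the total number of runs in a random arrangement of two kinds of letters; the function names `count_runs` and `prob_runs_at_most` are illustrative, and the letter sequences are copied from the lists above.

```python
from math import comb

def count_runs(letters):
    """Number of runs in a string of A/B letters (spaces are ignored)."""
    s = letters.replace(" ", "")
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)

def prob_runs_at_most(r_max, n_a, n_b):
    """Exact P(total runs <= r_max) for a random ordering of n_a A's and n_b B's."""
    total = comb(n_a + n_b, n_a)
    ways = 0
    for r in range(2, r_max + 1):
        k, odd = divmod(r, 2)
        if odd:   # r = 2k + 1: one letter forms k + 1 runs, the other k
            ways += (comb(n_a - 1, k) * comb(n_b - 1, k - 1)
                     + comb(n_a - 1, k - 1) * comb(n_b - 1, k))
        else:     # r = 2k: each letter forms k runs
            ways += 2 * comb(n_a - 1, k - 1) * comb(n_b - 1, k - 1)
    return ways / total

scripps = "BB A B AAA B AAA BB AB AA B AAAAA BBB A BBBBBB A BB AAA B"
langara = "AAAA BB A BBBBB A BBB AAAA"
triple = "AAA B AA BBBBBB A BBB AAAA"
cape = "AA BB AAAA BBBBBB A BB AAA"

print([count_runs(s) for s in (scripps, langara, triple, cape)])
print(round(prob_runs_at_most(7, 10, 10), 3))
```

With 10 letters of each kind, the probability of 7 or fewer runs is 9470/184756, or about 0.051, in agreement with the figure quoted for the three island and coastal stations.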

It should be pointed out that once a time series of 20 to 40 years in 
length is selected for analysis, runs as long as 5 or 6 years, among those ob- 
served, are reasonable and expected. For example, a slightly different test of 
hypothesis using runs is based on the length of the longest run.’ For the longest 
run to be significant at the 5 percent level, it must be at least length 7 in 20 
observations or length 9 in 40 observations. The longest runs obtained in this 
analysis are of length 6, and are not significant for either series length. 

An alternate test for randomness is the autocorrelation coefficient with
lag 1, or more simply the statistic


$$W = \sum_{i=1}^{N} X_i X_{i+1}$$


If a set of observations is ordered with respect to time, and if time is irrelevant,
no correlation would be expected to exist between successive pairs of values of
the sequence. A nonparametric test has been devised to test the hypothesis of
zero autocorrelation.¹⁰ The random variable $W$ is approximately normally distributed
with mean


$$M_W = \frac{S_1^2 - S_2}{N - 1}$$


and variance


$$\sigma_W^2 = \frac{S_2^2 - S_4}{N - 1} + \frac{S_1^4 - 4 S_1^2 S_2 + 4 S_1 S_3 + S_2^2 - 2 S_4}{(N - 1)(N - 2)} - M_W^2$$


where


$$S_k = \sum_{i=1}^{N} X_i^k$$


The $X$'s are the sequence of $\beta_0$'s. For Scripps Pier, $N = 40$, and
$X_{41} = X_1$. Calculations yield $W = 149.184$, $M_W = 145.850$, and $\sigma_W = 2.25$, so that
$t = (W - M_W)/\sigma_W = 1.48$. For $t$ a standard normal variable, the 5 percent critical
value is 1.64, since the alternative hypothesis is $W > M_W$. Since $1.48 < 1.64$, the
null hypothesis $W = M_W$ is not rejected, and again it is concluded that no trend
exists in the sequence of $\beta_0$'s.
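
A sketch of this test is given below. It assumes the usual permutation moments of the circular lag-1 product sum (the Wald-Wolfowitz serial correlation test), written in terms of the power sums of the observations; the function name `serial_correlation_test` is hypothetical and the code is not the report's program.

```python
import math

def serial_correlation_test(x):
    """Return (W, mean, std, t) for the circular lag-1 statistic W."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    # Circular definition: the value after the last one is the first one.
    w = sum(x[i] * x[(i + 1) % n] for i in range(n))
    # Power sums S_1 .. S_4 of the observations.
    s1, s2, s3, s4 = (sum(v ** p for v in x) for p in (1, 2, 3, 4))
    mean = (s1 ** 2 - s2) / (n - 1)
    var = ((s2 ** 2 - s4) / (n - 1)
           + (s1 ** 4 - 4 * s1 ** 2 * s2 + 4 * s1 * s3 + s2 ** 2 - 2 * s4)
           / ((n - 1) * (n - 2))
           - mean ** 2)
    std = math.sqrt(max(var, 0.0))   # guard against tiny negative rounding error
    t = (w - mean) / std if std > 0 else float("nan")
    return w, mean, std, t
```

The value of `t` is compared with the one-sided normal critical value (1.64 at the 5 percent level). Because the variance involves differences of large power sums, it may be numerically safer to subtract the sample mean from the sequence before calling the function; with the circular definition this shifts `W` and its mean by the same constant and leaves `t` unchanged.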

Since no trend was detected in the Scripps Pier data, it is of interest to
go one step further into the question of randomness and examine the empirical
distribution of certain statistics obtained from the analysis of the 40 individual
years. Figure 7A is a histogram of the same set of $\beta_0$'s that were tested for
trend. The normal curve with the sample mean of 16.912 and sample standard
deviation of 0.613 is also shown in the figure. Even though the histogram is
skewed, a chi-square goodness-of-fit test leads to an acceptance of normality
at the 5 percent probability level.

The purpose of this study is not one of making goodness-of-fit tests,
and no further use is made of this technique. Rather it is included to point out
that, on the basis of tests for trend and the histogram above, quantities such as
$\beta_0$ used to characterize sea-surface temperatures for an entire year behave
exactly as one expects independent random variables to behave. This is not to
deny the existence of real year-to-year differences in the ocean, but rather to
emphasize that these differences are not unexpected to an oceanometrician.


As a test for year-to-year differences in the $\beta_0$'s, table 7 displays an
analysis of variance for each station. The between-years sum of squares
is $\sum_{1}^{N} (\beta_0 - \bar{\beta}_0)^2$; the within-years sum of squares is $\sum_{1}^{N} \sigma_R^2$. The quantities
$\beta_0$ and $\sigma_R^2$ are individual year values and the summations are over the $N$ years
in a sample. Since the off-diagonal terms in the product-moment matrix are
negligible, an individual year $\sigma_R^2$ divided by the appropriate sample size is an
estimate of the variance of $\beta_0$ for that year. (A discussion of the variance of
regression coefficients can be found in reference 11.) The F-ratios are all highly
significant, and it is concluded that there are real year-to-year differences in the
$\beta_0$'s.

TABLE 7. ANALYSIS OF VARIANCE OF ANNUAL $\beta_0$


Location             Between Years    Within Years    Degrees of Freedom    F-Ratio

Cape St. James

Similar analyses of randomness have been applied to the Scripps Pier
40-year time-series of annual amplitude, annual phase, and percent variance
explained. Histograms are shown in figures 7B, C, and D. In all cases the histograms
are typical of those obtained from a sample of size 40 from a population
where the variable in question has a unimodal, slightly skewed frequency function.
In addition the run tests for trend yield total runs of 18, 23, and 22 for
amplitude, phase, and percent variance explained, respectively. All these totals
are greater than the still applicable critical value of 15 used for the $\beta_0$'s, so
again it is concluded that no trend exists in the time-series of these variables.

Figure 7. Histograms for annual Scripps Pier data of average temperature,
amplitude, phase, and percent variance explained.


SUMMARY AND CONCLUSIONS 


By alternating an autocorrelation analysis and an harmonic analysis of 
certain time-series of daily sea-surface temperatures, an adequate estimator of 
these temperatures has been determined. A judicious selection of regression 
variables would have made the autocorrelation analysis unnecessary, but such an 
analysis does provide clues to the nature of the time-series. 

For all stations considered, a regression model containing annual and 
semiannual oscillatory terms (sines and cosines) provides a good statistical fit 
to the observed daily temperatures. Some of the stations have nonrandom missing
data, generally in the winter. Reference 1 examines in detail the effect of
this type of data on the variances of regression coefficients and autocorrelation
coefficients. These results are extended to the residual variances used in this
report. The correction for nonrandom missing data increases the residual vari- 
ance. Correspondingly, F-ratios are reduced. The effect is conservative. That 
is, one is less likely to reject null hypotheses after the correction is made than 
before. 

The question of the existence of trends in ocean temperatures is an impor- 
tant one. Several statistical tests for trend were performed on the sequences of 
annual mean sea-surface temperatures, and on the sequences of amplitudes and 
phases describing the regression functions. No trends were discovered to exist in 
any of the sequences. It should be pointed out that, if trends did exist, it would 
be a straightforward statistical problem to isolate their effect on the time-series. 


RECOMMENDATIONS 


1. An analysis of variance of the annual mean sea-surface temperatures 
indicates that there are real year-to-year differences in the means. In the light 
of these differences it is recommended that an investigation be made into the 
length of time-series necessary to produce reliable long-term estimates of sea- 
surface temperatures. 


2. The yearly means behave like random variables. Shorter sequences 
of means with a systematic or cyclical appearance may occur by chance. It is 
recommended that an investigation be made into the frequency of occurrence of 
such unusual short sequences. 